DO NOT DISTRIBUTE Learning Response Time for WebSources using Query Feedback and Application in Query Optimization

نویسندگان

  • Jean-Robert Gruser
  • Louiqa Raschid
  • Vladimir Zadorozhny
  • Tao Zhan
چکیده

The rapid growth of the Internet and support for interoperability protocols has increased the number of Web accessible sources, WebSources. Current wrapper mediator architectures need to be extended with a Wrapper Cost Model (WCM) for WebSources that can estimate the response time (delays) to access sources as well as other relevant statistics. In this paper, we present a Web Prediction Tool (WebPT), a tool that is based on learning using query feedback from WebSources. The WebPT uses dimensions Time of day, Day, and Quantity of data, to learn response times from a particular WebSource, and to predict the expected response time (delay) for some query. Experiment data was collected from several sources, and those dimensions that were signiicant in estimating the response time were determined. We then trained the WebPT on the collected data, to use the three dimensions mentioned above, and to predict the response time, as well as a conndence in the prediction. We describe the WebPT learning algorithms, and report on the WebPT learning for WebSources. Our research shows that we can improve the quality of learning by tuning the WebPT features, e.g., training the WebPT using a logarithm of the input training data; including signiicant dimensions in the WebPT; or changing the ordering of dimensions. A comparison of the WebPT with more traditional neural network (NN) learning has been performed, and we brieey report on the comparison. We then demonstrate how the WebPT prediction of delay may be used by a scrambling enabled optimizer. A scrambling algorithm identiies some critical points of delay, where it makes a decision to scramble (modify) a plan, to attempt to hide the expected delay by computing some other part of the plan that is unaaected by the delay. We explore the space of real delay at a WebSource, versus the WebPT prediction of this delay, with respect to critical points of delay in speciic plans. We identify those cases where WebPT overestimation or underestimation of the real delay results in a penalty in the scrambling enabled optimizer, and those cases where there is no penalty. Using the experimental data and WebPT learning, we test how good the WebPT is in minimizing these penalties. Data-intensive applications on the Web Query languages and systems for Web data (perhaps) Data integration over the Web (perhaps)

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Validating a Cost Model for Wide Area Applications

Recent technology advances have enabled wide area application (WAA) processing with WebSources that are accessible via the WWW. One challenge to query processing in wide area environments is the unpredictable behavior of WebSources in the dynamic WAN. There can be wide variability in the latency (delay) of accessing these sources, and the delay could depend on the network and server workloads. ...

متن کامل

Web pages ranking algorithm based on reinforcement learning and user feedback

The main challenge of a search engine is ranking web documents to provide the best response to a user`s query. Despite the huge number of the extracted results for user`s query, only a small number of the first results are examined by users; therefore, the insertion of the related results in the first ranks is of great importance. In this paper, a ranking algorithm based on the reinforcement le...

متن کامل

RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features

Principal aim of a search engine is to provide the sorted results according to user’s requirements. To achieve this aim, it employs ranking methods to rank the web documents based on their significance and relevance to user query. The novelty of this paper is to provide user feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...

متن کامل

Relational Databases Query Optimization using Hybrid Evolutionary Algorithm

Optimizing the database queries is one of hard research problems. Exhaustive search techniques like dynamic programming is suitable for queries with a few relations, but by increasing the number of relations in query, much use of memory and processing is needed, and the use of these methods is not suitable, so we have to use random and evolutionary methods. The use of evolutionary methods, beca...

متن کامل

Learning Response Times for WebSources: A Comparison of a Web Prediction Tool (WebPT) and a Neural Network

The rapid growth of the Internet and support for interoperability protocols has increased the number of Web accessible sources, WebSources. Current wrapper mediator architectures need to be extended with a Wrapper Cost Model (WCM) for WebSources that can estimate the response time (delays) to access sources as well as other relevant statistics. In this paper, we present a Web Prediction Tool (W...

متن کامل

Query Optimization to Meet Performance Guarantees for Wide Area Applications

Recent technology advances have enabled mediated query processing with Internet accessible WebSources. A characteristic of WebSources is that their access costs exhibit transient behavior. These costs depend on the network and server workloads, which are often affected by the time of day, day, etc. Given a distribution of access costs, a potential optimization strategy is based on the least exp...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999